{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training an agent to walk using TRPO\n",
    "In this section, we train a Humanoid agent to walk using Trust Region Policy\n",
    "Optimization (TRPO), as implemented in Stable Baselines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gym\n",
    "from stable_baselines.common.policies import MlpPolicy\n",
    "from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize\n",
    "from stable_baselines import TRPO"
   ]
  },
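  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a vectorized Humanoid environment using `DummyVecEnv`. Note that `Humanoid-v2` is a\n",
    "MuJoCo environment, so it needs a working `mujoco-py` installation:"
   ]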
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "env = DummyVecEnv([lambda: gym.make(\"Humanoid-v2\")])"
   ]
  },
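  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wrap the environment in `VecNormalize` to normalize the states (observations) and clip\n",
    "them to the range [-10, 10]. The rewards are left unnormalized:"
   ]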
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "env = VecNormalize(env, norm_obs=True, norm_reward=False,\n",
    "                   clip_obs=10.)"
   ]
  },
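  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the normalization step more concrete, here is a minimal NumPy sketch of running\n",
    "observation normalization with clipping. It only illustrates the idea behind\n",
    "`norm_obs=True` and `clip_obs=10.`; it is not the library's actual code, and the class name\n",
    "`RunningObsNormalizer` and the `epsilon` value are assumptions made for this illustration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class RunningObsNormalizer:\n",
    "    # hypothetical helper, for illustration only (not part of stable_baselines)\n",
    "    def __init__(self, shape, clip_obs=10.0, epsilon=1e-8):\n",
    "        self.mean = np.zeros(shape)\n",
    "        self.var = np.ones(shape)\n",
    "        self.count = 0\n",
    "        self.clip_obs = clip_obs\n",
    "        self.epsilon = epsilon\n",
    "\n",
    "    def update(self, batch):\n",
    "        # merge the statistics of a new batch into the running mean and variance\n",
    "        batch = np.asarray(batch, dtype=np.float64)\n",
    "        batch_mean, batch_var, batch_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]\n",
    "        total = self.count + batch_count\n",
    "        delta = batch_mean - self.mean\n",
    "        new_mean = self.mean + delta * batch_count / total\n",
    "        m2 = (self.var * self.count + batch_var * batch_count\n",
    "              + delta ** 2 * self.count * batch_count / total)\n",
    "        self.mean, self.var, self.count = new_mean, m2 / total, total\n",
    "\n",
    "    def normalize(self, obs):\n",
    "        # standardize the observation, then clip it to [-clip_obs, clip_obs]\n",
    "        scaled = (obs - self.mean) / np.sqrt(self.var + self.epsilon)\n",
    "        return np.clip(scaled, -self.clip_obs, self.clip_obs)\n",
    "\n",
    "# feed synthetic observations with mean 5 and standard deviation 2\n",
    "rng = np.random.RandomState(0)\n",
    "normalizer = RunningObsNormalizer(shape=(3,))\n",
    "for _ in range(100):\n",
    "    normalizer.update(rng.normal(loc=5.0, scale=2.0, size=(32, 3)))\n",
    "\n",
    "# the first two entries end up close to 0; the outlier is clipped to clip_obs\n",
    "print(normalizer.normalize(np.array([5.0, 5.0, 1000.0])))"
   ]
  },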
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instantiate the TRPO agent with a multilayer perceptron policy (`MlpPolicy`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = TRPO(MlpPolicy, env)"
   ]
  },
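  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can train the agent for 25,000 time steps. This is a short run; learning a stable\n",
    "gait on Humanoid typically takes many more time steps:"
   ]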
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent.learn(total_timesteps=25000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also look at how our trained agent performs in the environment. The following loop\n",
    "renders the agent and keeps running until we interrupt it:"
   ]
  },
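  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run the trained agent and render each step; a vectorized environment resets\n",
    "# itself when an episode ends, so this loop runs until it is interrupted\n",
    "state = env.reset()\n",
    "while True:\n",
    "    # select an action with the trained policy\n",
    "    action, _ = agent.predict(state)\n",
    "    # perform the action and move to the next state\n",
    "    next_state, reward, done, info = env.step(action)\n",
    "    state = next_state\n",
    "    env.render()"
   ]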
  },
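  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want a number rather than a rendering, the same loop can be bounded and made to\n",
    "collect rewards. The following is a minimal sketch; the helper name `evaluate` and its\n",
    "`n_steps` parameter are assumptions introduced here, not part of Stable Baselines:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def evaluate(agent, env, n_steps=1000):\n",
    "    # hypothetical helper: run the agent for a fixed number of steps in a\n",
    "    # vectorized environment and return the mean reward per step\n",
    "    state = env.reset()\n",
    "    rewards = []\n",
    "    for _ in range(n_steps):\n",
    "        action, _ = agent.predict(state)\n",
    "        # the vectorized environment resets itself when an episode ends,\n",
    "        # so there is no need to call reset() inside the loop\n",
    "        state, reward, done, info = env.step(action)\n",
    "        rewards.append(float(np.mean(reward)))\n",
    "    return float(np.mean(rewards))\n",
    "\n",
    "# usage with the agent and environment created above (uncomment to run):\n",
    "# print(evaluate(agent, env, n_steps=1000))"
   ]
  },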
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Save all the code used in this section in a Python file called `trpo.py`, then open a\n",
    "terminal and run the file:\n",
    "\n",
    "`python trpo.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recording the video\n",
    "\n",
    "In the previous section, we trained our agent to walk using TRPO. Can we also\n",
    "record a video of our trained agent? Yes! With Stable Baselines, we can record a\n",
    "video of our agent using the `VecVideoRecorder` wrapper.\n",
    "\n",
    "Note that to record the video, we need the ffmpeg package installed on our machine. If it is\n",
    "not installed, install it on Ubuntu using the following commands. The PPA below targets\n",
    "Ubuntu 14.04 (Trusty); on later releases, ffmpeg is available from the default repositories, so\n",
    "the last command alone should be enough:\n",
    "\n",
    "    sudo add-apt-repository ppa:mc3man/trusty-media\n",
    "    sudo apt-get update\n",
    "    sudo apt-get dist-upgrade\n",
    "    sudo apt-get install ffmpeg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's import the `VecVideoRecorder` wrapper:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from stable_baselines.common.vec_env import VecVideoRecorder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define a function called `record_video` for recording the video:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def record_video(env_name, agent, video_length=500, prefix='', video_folder='videos/'):\n",
    "\n",
    "    # create the environment\n",
    "    # note: this environment is not wrapped in VecNormalize, so the agent\n",
    "    # receives raw observations here, unlike during training\n",
    "    env = DummyVecEnv([lambda: gym.make(env_name)])\n",
    "\n",
    "    # wrap the environment in the video recorder; recording starts at step 0\n",
    "    # and lasts for video_length steps\n",
    "    env = VecVideoRecorder(env, video_folder=video_folder,\n",
    "                           record_video_trigger=lambda step: step == 0,\n",
    "                           video_length=video_length, name_prefix=prefix)\n",
    "\n",
    "    # select actions using our trained agent for video_length time steps\n",
    "    state = env.reset()\n",
    "    for _ in range(video_length):\n",
    "        action, _ = agent.predict(state)\n",
    "        next_state, reward, done, info = env.step(action)\n",
    "        state = next_state\n",
    "\n",
    "    # close the environment and the video recorder\n",
    "    env.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's it! Now, let's call our `record_video` function, passing the environment name,\n",
    "our trained agent, the length of the video in time steps, and the prefix used to name the video file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "record_video('Humanoid-v2', agent, video_length=500, prefix='Humanoid_walk_TRPO')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we will have a new file called `Humanoid_walk_TRPO-step-0-to-step-500.mp4` in\n",
    "the `folder` videos."
   ]
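  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check, we can rebuild the expected file name and test whether the file exists.\n",
    "This is a small sketch that assumes the `prefix-step-START-to-step-END.mp4` naming pattern\n",
    "seen in the file name above; the helper `expected_video_path` is a hypothetical name\n",
    "introduced for this illustration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "def expected_video_path(prefix, video_length, video_folder='videos/', start_step=0):\n",
    "    # hypothetical helper: rebuild the file name from the prefix and step range\n",
    "    name = '{}-step-{}-to-step-{}.mp4'.format(prefix, start_step, start_step + video_length)\n",
    "    return os.path.join(video_folder, name)\n",
    "\n",
    "path = expected_video_path('Humanoid_walk_TRPO', video_length=500)\n",
    "# report whether the recording is where we expect it to be\n",
    "print(path, 'exists' if os.path.isfile(path) else 'not found')"
   ]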
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
